Data analysis in software engineering: an approach to incremental data-driven effort estimation.
Cost and effort estimation in software projects have been investigated for several years. Nonetheless, compared to other engineering fields, a large number of projects still fail in different phases due to prediction errors. On average, large IT projects run 45 percent over budget and seven percent over schedule, while delivering 56 percent less value than predicted. Several effort estimation models have been defined in the past, mainly based on user experience or on data collected in previous projects, but no studies support incremental effort estimation and tracking. Iterative development techniques, and in particular Agile techniques, partially support incremental effort estimation, but because of the complexity of the estimation, the total effort always tends to be higher than expected. Therefore, this work focuses on defining an adequate incremental, data-driven estimation model to support developers and project managers in keeping track of the remaining effort incrementally. The result of this work is a set of effort estimation models based on context factors, such as the application domain, the size of the project team, and other characteristics. Moreover, in this work we do not aim at defining a model with generic parameters to be applied in similar contexts; instead, we define a mathematical approach to customize the model for each development team.
The first step of this work focused on the analysis of existing estimation models and the collection of evidence on the accuracy of each model. We then defined our approach, based on Ordinary Least Squares (OLS) regression analysis, to investigate the existence of correlations between the actual effort and other characteristics. While building the OLS models, we analyzed the data set and removed outliers to prevent them from unduly influencing the resulting OLS regression lines. To validate the results, we applied 10-fold cross-validation, assessing the accuracy of the results in terms of R², MRE, and MdMRE. The model has been applied to two different case studies. First, we analyzed a large number of projects developed by means of the waterfall process. Then, we analyzed an Agile process, to understand whether the developed model is also applicable to agile methodologies. In the first case study, we want to understand whether we can define an effort estimation model that predicts the effort of the next development phase based on the effort already spent. For this reason, we investigated whether it is possible to use:
• the effort of one phase for estimating the effort of the next development phase
• the effort of one phase for estimating the remaining project effort
• the effort spent up to a development phase to estimate the effort of that phase
• the effort spent up to a development phase to estimate the remaining project effort
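The OLS-with-outlier-removal step described above can be sketched as follows. This is a minimal illustration on made-up effort figures, not data from the study: a simple linear model of next-phase effort against current-phase effort is fitted, points whose residual exceeds two sample standard deviations are dropped, and the model is refitted.

```python
# Sketch of the OLS approach on hypothetical data: predict the effort
# of the next phase from the effort of the current phase, removing
# outliers before the final fit. All numbers are illustrative only.
from statistics import mean

# (current-phase effort, next-phase effort) in person-hours -- made up;
# the last pair is an intentional outlier.
observations = [(120, 300), (80, 210), (150, 370), (60, 160),
                (200, 520), (90, 230), (100, 900)]

def fit_ols(points):
    """Simple-linear OLS: effort_next = a + b * effort_current."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in points) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

# First fit, then drop points whose residual exceeds 2 sample standard
# deviations, then refit -- mirroring the outlier-removal step above.
a, b = fit_ols(observations)
residuals = [y - (a + b * x) for x, y in observations]
sd = (sum(r * r for r in residuals) / (len(residuals) - 1)) ** 0.5
kept = [(x, y) for (x, y), r in zip(observations, residuals) if abs(r) <= 2 * sd]
a2, b2 = fit_ols(kept)
print(f"model: effort_next = {a2:.1f} + {b2:.2f} * effort_current")
```

In a real analysis each candidate predictor (phase effort, cumulative effort) would get its own model, validated with 10-fold cross-validation rather than a single fit.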
Then, we investigated whether the prediction accuracy can be improved by considering other common context factors, such as project domain, development language, development platform, development process, programming language, and number of Function Points. We analyzed projects collected in the ISBSG dataset and, considering the different context factors available, we ran a total of 4,500 analyses to understand which factors are the most suitable to apply in a specific context. The results of this first case study show a set of statistically significant correlations between: (1) the effort spent in one phase and the effort spent in the following one; (2) the effort spent in a phase and the remaining effort; (3) the cumulative effort up to the current phase and the remaining effort. However, the results also show that these estimation models come with different degrees of goodness of fit. Finally, including further information, such as the functional size, does not significantly improve estimation quality. In the second case study, a project developed with an agile methodology (SCRUM) was analyzed. In this case, we want to understand whether it is possible to use our estimation approach to help developers increase the accuracy of expert-based estimation.
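The accuracy measures named above, MRE and its median variant MdMRE, are straightforward to compute. The following sketch uses hypothetical actual and predicted efforts, not figures from the case studies:

```python
# MRE (Magnitude of Relative Error) per project, plus the mean (MMRE)
# and median (MdMRE) across projects. Values are illustrative only.
from statistics import mean, median

actual    = [300, 210, 370, 160, 520]   # observed effort, person-hours
predicted = [340, 190, 400, 150, 480]   # hypothetical model output

# MRE_i = |actual_i - predicted_i| / actual_i
mre = [abs(a - p) / a for a, p in zip(actual, predicted)]

mmre  = mean(mre)    # mean MRE, sensitive to single bad estimates
mdmre = median(mre)  # median MRE, more robust to outliers
print(f"MMRE = {mmre:.3f}, MdMRE = {mdmre:.3f}")
```

MdMRE is commonly reported alongside MMRE precisely because one badly estimated project can dominate the mean.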
In SCRUM, effort estimation is carried out at the beginning of each sprint, usually based on story points. The usage of functional size measures, specifically selected for the type of application and the development conditions, is expected to allow for more accurate effort estimates. The goal of the work presented here is to verify this hypothesis based on experimental data. The association of story measures with actual effort and the accuracy of the resulting effort model are evaluated. The study shows that the developers' estimates are more accurate than those based on functional measurement. In conclusion, our study shows that easy-to-collect functional measures do not help developers improve the accuracy of effort estimation in Moonlight SCRUM. The models derived in our work can be used by project managers and developers who need to estimate or control the project effort in a development process. These models can also be used by developers to track their performance and understand the reasons for effort estimation errors. Finally, the models help project managers react as soon as possible and reduce project failures due to estimation errors. The detailed results are reported in the next chapters as follows:
• Chapter 1 reports the introduction to this work
• Chapter 2 reports the related literature review on effort estimation techniques
• Chapter 3 reports the proposed effort estimation approach
• Chapter 4 describes the application of our approach to the waterfall process
• Chapter 5 describes the application of our approach to SCRUM
• Chapter 6 reports the conclusions and future work
Open Tracing Tools: Overview and Critical Comparison
Background. To cope with the rapidly growing complexity of contemporary software architectures, tracing has become an increasingly critical practice and has been widely adopted by software engineers. By adopting tracing tools, practitioners are able to monitor, debug, and optimize distributed software architectures easily. However, with an excessive number of valid candidates, researchers and practitioners have a hard time finding and selecting suitable tracing tools by systematically considering their features and advantages. Objective. To this purpose, this paper aims to provide an overview of the popular tracing tools on the market via a critical comparison. Method. Herein, we first identified 11 tools in an objective, systematic, and reproducible manner by adopting the Systematic Multivocal Literature Review protocol. Then, we characterized each tool looking at its (1) measured features, (2) popularity both in peer-reviewed literature and online media, and (3) benefits and issues. Results. As a result, this paper presents a systematic comparison among the selected tracing tools in terms of their features, popularity, benefits, and issues. Conclusion. This result mainly shows that each tracing tool provides a unique combination of features, with different pros and cons. The contribution of this paper is to provide practitioners with a better understanding of the tracing tools, facilitating their adoption.
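The core abstraction these tools share is the span: a named, timed unit of work nested under a parent. The toy, stdlib-only sketch below illustrates that concept; real tracing tools additionally provide distributed context propagation, sampling, and exporters to a backend, all omitted here.

```python
# A toy illustration of the "span" abstraction that tracing tools
# implement. Not a real tracer: no context propagation across
# processes, no sampling, no exporter -- just nested timed records.
import time
from contextlib import contextmanager

finished_spans = []  # a real tracer would export these to a backend
_stack = []          # currently open spans (single process, single thread)

@contextmanager
def span(name):
    parent = _stack[-1] if _stack else None
    record = {"name": name, "parent": parent["name"] if parent else None}
    _stack.append(record)
    start = time.perf_counter()
    try:
        yield record
    finally:
        record["duration_s"] = time.perf_counter() - start
        _stack.pop()
        finished_spans.append(record)

# Simulate a request that performs a nested database query.
with span("handle_request"):
    with span("db_query"):
        time.sleep(0.01)

for s in finished_spans:
    print(s["name"], "parent:", s["parent"])
```

Inner spans finish first, so `db_query` (parented to `handle_request`) is recorded before its parent; a backend reassembles the tree from these parent links.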
Technical Debt Prioritization: State of the Art. A Systematic Literature Review
Background. Software companies need to manage and refactor Technical Debt issues. Therefore, it is necessary to understand if and when refactoring Technical Debt should be prioritized with respect to developing features or fixing bugs. Objective. The goal of this study is to investigate the existing body of knowledge in software engineering to understand what Technical Debt prioritization approaches have been proposed in research and industry. Method. We conducted a Systematic Literature Review of 384 unique papers published until 2018, following a consolidated methodology applied in software engineering. We included 38 primary studies. Results. Different approaches have been proposed for Technical Debt prioritization, all having different goals and optimizing on different criteria. The proposed measures capture only a small part of the plethora of factors used to prioritize Technical Debt qualitatively in practice. We report an impact map of such factors. However, there is a lack of empirically validated tools. Conclusion. We observed that Technical Debt prioritization research is preliminary and there is no consensus on what the important factors are and how to measure them. Consequently, we cannot consider current research conclusive, and in this paper we outline different directions for necessary future investigations.
Does Cyclomatic or Cognitive Complexity Better Represents Code Understandability? An Empirical Investigation on the Developers Perception
Background. Code understandability is fundamental: developers need to clearly understand the code they are modifying. Low understandability can increase the coding effort, and misinterpretation of code has an impact on the entire development process. Ideally, developers should write clear and understandable code with the least possible effort. Objective. The goal of this work is to investigate whether McCabe's Cyclomatic Complexity or Cognitive Complexity is a good predictor of the code understandability perceived by developers, to understand which of the two complexities can be used as a criterion to evaluate whether a piece of code is understandable. Method. We designed and conducted an empirical study among 216 junior developers with professional experience ranging from one to four years. We asked them to manually inspect and rate the understandability of 12 Java classes that exhibit different levels of Cyclomatic and Cognitive Complexity. Results. Cognitive Complexity slightly outperforms Cyclomatic Complexity in predicting the developers' perceived understandability. Conclusion. The identification of a clear and validated measure of code complexity is still an open issue: neither the old-fashioned McCabe Cyclomatic Complexity nor the more recent Cognitive Complexity is a good predictor of code understandability, at least when considering the complexity perceived by junior developers.
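To make the difference between the two metrics concrete: Cyclomatic Complexity counts decision points flatly, while Cognitive Complexity additionally penalizes nesting. The sketch below implements only a simplified Cyclomatic counter (the Cognitive variant's nesting rules are more involved and are not attempted here), and the decision-node set is a common simplification rather than a normative definition:

```python
# A simplified Cyclomatic Complexity counter for Python source,
# illustrating the flat decision-point counting McCabe's metric is
# based on. Cognitive Complexity would also weight nesting depth,
# which this sketch deliberately omits.
import ast

# Node types commonly counted as decision points (a simplification).
DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                  ast.BoolOp, ast.IfExp)

def cyclomatic(source: str) -> int:
    """Return 1 + the number of decision points in the source."""
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, DECISION_NODES)
                   for node in ast.walk(tree))

snippet = """
def classify(x):
    if x < 0 and x != -1:
        return "negative"
    for i in range(x):
        if i % 2:
            return "odd seen"
    return "done"
"""
# Decision points: outer if, its boolean operator, the for loop,
# and the inner if -- four in total, so complexity is 5.
print(cyclomatic(snippet))
```

A Cognitive Complexity counter would score the inner `if` higher than the outer one because it sits one level deeper, which is exactly the structural nuance the study compares against developers' perception.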
- …